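Autonomous agents require self-localization to navigate in unknown environments. They can use visual odometry (VO) to estimate self-motion and localize themselves using visual sensors. Unlike inertial sensors, which suffer from drift, or wheel encoders, which suffer from slippage, this motion-estimation strategy is not compromised by either effect. However, VO with conventional cameras is computationally demanding, which limits its application in systems with strict low-latency, low-memory, and low-energy requirements. Using event-based cameras and neuromorphic computing hardware offers a promising low-power solution to the VO problem. However, conventional VO algorithms are not readily convertible to neuromorphic hardware. In this work, we present a VO algorithm built entirely of neuronal building blocks suitable for neuromorphic implementation. The building blocks are groups of neurons that represent vectors in the computational framework of Vector Symbolic Architecture (VSA), which was proposed as an abstraction layer for programming neuromorphic hardware. The VO network we propose generates and stores a working memory of the presented visual environment. It updates this working memory while simultaneously estimating the changing location and orientation of the camera. We demonstrate how VSA can be leveraged as a computing paradigm for neuromorphic robotics. Moreover, our results represent an important step towards using neuromorphic computing hardware for fast and power-efficient VO and the related task of simultaneous localization and mapping (SLAM). We validate this approach experimentally in a robotic task and with an event-based dataset, and demonstrate state-of-the-art performance.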
translated by 谷歌翻译
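Inferring the position of objects and their rigid transformations is still an open problem in visual scene understanding. Here we propose a neuromorphic solution that uses an efficient factorization network based on three key concepts: (1) a computational framework based on Vector Symbolic Architectures (VSA) with complex-valued vectors; (2) the design of Hierarchical Resonator Networks (HRN) to deal with the non-commutative nature of translation and rotation in visual scenes when both are used in combination; (3) the design of a multi-compartment spiking phasor neuron model for implementing complex-valued vector binding on neuromorphic hardware. The VSA framework uses vector binding operations to produce generative image models in which binding acts as the equivariant operation for geometric transformations. A scene can therefore be described as a sum of vector products, which in turn can be efficiently factorized by a resonator network to infer objects and their poses. The HRN enables the definition of a partitioned architecture in which vector binding is equivariant for horizontal and vertical translation within one partition, and for rotation and scaling within the other partition. The spiking neuron model allows the resonator network to be mapped onto efficient, low-power neuromorphic hardware. In this work, we demonstrate our approach using synthetic scenes composed of simple 2D shapes undergoing rigid geometric transformations and color changes. A companion paper demonstrates this approach in real-world application scenarios for machine vision and robotics.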
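As a rough illustration of the "sum of vector products" idea in the abstract above, the following sketch binds complex-valued phasor vectors by component-wise multiplication and unbinds with the complex conjugate. It is not the paper's resonator network: it assumes one factor (the shape) is already known and only looks up the other, whereas a resonator network searches over all factors jointly. The dimensionality, the codebook names (`shapes`, `poses`) and the helper functions are all hypothetical choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 2048  # vector dimensionality (illustrative choice, not from the paper)

def random_phasor(dim):
    """Random complex vector whose components all have unit magnitude."""
    return np.exp(1j * rng.uniform(-np.pi, np.pi, dim))

def bind(a, b):
    """Binding as component-wise complex multiplication (phases add)."""
    return a * b

def unbind(s, a):
    """Unbinding: multiply by the complex conjugate of a known factor."""
    return s * np.conj(a)

def similarity(a, b):
    """Normalized real inner product: 1 for identical phasor vectors,
    close to 0 for unrelated random ones."""
    return float(np.real(np.vdot(a, b)) / len(a))

# Hypothetical codebooks for two factors of a scene.
shapes = {name: random_phasor(DIM) for name in ("square", "triangle")}
poses = {name: random_phasor(DIM) for name in ("left", "right")}

# A scene as a sum of bound (shape, pose) products.
scene = (bind(shapes["square"], poses["left"])
         + bind(shapes["triangle"], poses["right"]))

def decode_pose(scene_vec, shape_name):
    """Given a known shape, return the most similar pose in the codebook."""
    query = unbind(scene_vec, shapes[shape_name])
    return max(poses, key=lambda p: similarity(query, poses[p]))

print(decode_pose(scene, "square"))
print(decode_pose(scene, "triangle"))
```

Unbinding the scene with a known shape leaves that shape's pose vector plus noise-like cross terms, which is why a nearest-neighbour lookup in the pose codebook suffices here.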
Large language models can perform new tasks in a zero-shot fashion, given natural language prompts that specify the desired behavior. Such prompts are typically hand engineered, but can also be learned with gradient-based methods from labeled data. However, it is underexplored what factors make the prompts effective, especially when the prompts are natural language. In this paper, we investigate common attributes shared by effective prompts. We first propose a human readable prompt tuning method (FluentPrompt) based on Langevin dynamics that incorporates a fluency constraint to find a diverse distribution of effective and fluent prompts. Our analysis reveals that effective prompts are topically related to the task domain and calibrate the prior probability of label words. Based on these findings, we also propose a method for generating prompts using only unlabeled data, outperforming strong baselines by an average of 7.0% accuracy across three tasks.
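The general shape of a Langevin-dynamics update with a combined energy can be sketched as below. This is a toy under stated assumptions, not the paper's method: the two quadratic energies stand in for a task loss and a fluency loss that would really come from a language model, the convex weighting `LAMBDA` is an assumed form, and the names `langevin_step`, `grad_energy`, and `sample` are hypothetical.

```python
import numpy as np

# Toy stand-ins: the "task" energy pulls towards one point and the
# "fluency" energy towards another. Both are assumptions for illustration.
TASK_TARGET = np.array([1.0, 0.0])
FLUENT_TARGET = np.array([0.0, 1.0])
LAMBDA = 0.5  # assumed weight of the fluency term

def grad_energy(x):
    """Gradient of E(x) = (1 - LAMBDA) * 0.5 * ||x - TASK_TARGET||^2
                        + LAMBDA * 0.5 * ||x - FLUENT_TARGET||^2."""
    return (1.0 - LAMBDA) * (x - TASK_TARGET) + LAMBDA * (x - FLUENT_TARGET)

def langevin_step(x, step_size, noise_scale, rng):
    """One update: a gradient step on the energy plus Gaussian noise."""
    noise = rng.standard_normal(x.shape)
    return (x - step_size * grad_energy(x)
            + np.sqrt(2.0 * step_size) * noise_scale * noise)

def sample(n_steps=2000, step_size=0.05, noise_scale=0.1, seed=0):
    """Run the chain and return every visited point, shape (n_steps, 2)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(2)
    visited = []
    for _ in range(n_steps):
        x = langevin_step(x, step_size, noise_scale, rng)
        visited.append(x.copy())
    return np.array(visited)

samples = sample()
print(samples[1000:].mean(axis=0))  # hovers near the energy minimum
```

The injected noise is what yields a distribution of solutions around the low-energy region rather than a single point, which matches the abstract's stated goal of finding a diverse set of prompts.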
In this work, we explore a useful but often neglected methodology for robustness analysis of text generation evaluation metrics: stress tests with synthetic data. Basically, we design and synthesize a wide range of potential errors and check whether they result in a commensurate drop in the metric scores. We examine a range of recently proposed evaluation metrics based on pretrained language models, for the tasks of open-ended generation, translation, and summarization. Our experiments reveal interesting insensitivities, biases, or even loopholes in existing metrics. For example, we find that BERTScore ignores truncation errors in summarization, and MAUVE (built on top of GPT-2) is insensitive to errors at the beginning of generations. Further, we investigate the reasons behind these blind spots and suggest practical workarounds for a more reliable evaluation of text generation.
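A stress test of the kind described above can be sketched as: apply a synthetic error to a candidate text, rescore it, and check whether the score drops by at least some threshold. The sketch below uses a simple unigram F1 as a stand-in metric so that it stays self-contained; it does not call BERTScore or MAUVE, and the helpers `truncate`, `unigram_f1`, and `stress_test` as well as the `min_drop` threshold are hypothetical.

```python
from collections import Counter

def truncate(text, keep_ratio=0.5):
    """Synthetic error: keep only the first part of the text."""
    words = text.split()
    keep = max(1, int(len(words) * keep_ratio))
    return " ".join(words[:keep])

def unigram_f1(candidate, reference):
    """Toy stand-in metric: F1 over overlapping word counts."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def stress_test(metric, perturb, candidate, reference, min_drop=0.1):
    """Score clean and perturbed candidates; flag whether the metric reacts."""
    clean = metric(candidate, reference)
    perturbed = metric(perturb(candidate), reference)
    return {
        "clean": clean,
        "perturbed": perturbed,
        "sensitive": (clean - perturbed) >= min_drop,
    }

reference = "the cat sat on the mat near the door"
result = stress_test(unigram_f1, truncate, reference, reference)
print(result)
```

Any metric with a `(candidate, reference)` signature could be passed in place of `unigram_f1`; a metric for which `sensitive` comes back false under truncation would exhibit the kind of blind spot the abstract reports.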
Simulating rigid collisions among arbitrary shapes is notoriously difficult due to complex geometry and the strong non-linearity of the interactions. While graph neural network (GNN)-based models are effective at learning to simulate complex physical dynamics, such as fluids, cloth and articulated bodies, they have been less effective and efficient on rigid-body physics, except with very simple shapes. Existing methods that model collisions through the meshes' nodes are often inaccurate because they struggle when collisions occur on faces far from nodes. Alternative approaches that represent the geometry densely with many particles are prohibitively expensive for complex shapes. Here we introduce the Face Interaction Graph Network (FIGNet) which extends beyond GNN-based methods, and computes interactions between mesh faces, rather than nodes. Compared to learned node- and particle-based methods, FIGNet is around 4x more accurate in simulating complex shape interactions, while also 8x more computationally efficient on sparse, rigid meshes. Moreover, FIGNet can learn frictional dynamics directly from real-world data, and can be more accurate than analytical solvers given modest amounts of training data. FIGNet represents a key step forward in one of the few remaining physical domains which have seen little competition from learned simulators, and offers allied fields such as robotics, graphics and mechanical design a new tool for simulation and model-based planning.
In computer-aided drug discovery (CADD), virtual screening (VS) is used for identifying the drug candidates that are most likely to bind to a molecular target in a large library of compounds. Most VS methods to date have focused on using canonical compound representations (e.g., SMILES strings, Morgan fingerprints) or generating alternative fingerprints of the compounds by training progressively more complex variational autoencoders (VAEs) and graph neural networks (GNNs). Although VAEs and GNNs led to significant improvements in VS performance, these methods suffer from reduced performance when scaling to large virtual compound datasets. The performance of these methods has shown only incremental improvements in the past few years. To address this problem, we developed a novel method using multiparameter persistence (MP) homology that produces topological fingerprints of the compounds as multidimensional vectors. Our primary contribution is framing the VS process as a new topology-based graph ranking problem by partitioning a compound into chemical substructures informed by the periodic properties of its atoms and extracting their persistent homology features at multiple resolution levels. We show that the margin loss fine-tuning of pretrained Triplet networks attains highly competitive results in differentiating between compounds in the embedding space and ranking their likelihood of becoming effective drug candidates. We further establish theoretical guarantees for the stability properties of our proposed MP signatures, and demonstrate that our models, enhanced by the MP signatures, outperform state-of-the-art methods on benchmark datasets by a wide and highly statistically significant margin (e.g., 93% gain for Cleves-Jain and 54% gain for DUD-E Diverse dataset).
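The cornerstone of neural algorithmic reasoning is the ability to solve algorithmic tasks, especially in a way that generalises out of distribution. While recent years have seen a surge in methodological improvements in this area, they have mostly focused on building specialist models. Specialist models are capable of learning to execute either only one algorithm or a collection of algorithms with an identical control-flow backbone. Here, instead, we focus on constructing a generalist neural algorithmic learner: a single graph neural network processor capable of learning to execute a wide range of algorithms, such as sorting, searching, dynamic programming, path-finding, and geometry. We leverage the CLRS benchmark to show empirically that, much like recent successes in the domain of perception, generalist algorithmic learners can be built by "incorporating" knowledge. That is, it is possible to learn algorithms effectively in a multi-task manner, so long as we can learn to execute them well in a single-task regime. Motivated by this, we present a series of improvements to the input representation, training regime, and processor architecture over CLRS, improving average single-task performance by over 20%. We then conduct a thorough ablation of multi-task learners that leverage these improvements. Our results demonstrate a generalist learner that effectively incorporates the knowledge captured by specialist models.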
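Growing collections of free online 3D shapes have driven research on 3D retrieval. There has, however, been active debate on (i) what the best input modality is to trigger retrieval, and (ii) the ultimate usage scenario for such retrieval. In this paper, we offer a different perspective on these questions: we study 3D sketches as an input modality and advocate a VR scenario in which retrieval is conducted. The ultimate vision is thus that users can freely retrieve a 3D model by air-doodling in a VR environment. As a first stab at this new 3D VR-sketch to 3D shape retrieval problem, we make four contributions. First, we code a VR utility to collect 3D VR-sketches and conduct retrieval. Second, we collect the first set of 167 3D VR-sketches on two shape categories from ModelNet. Third, we propose a novel approach to generate a synthetic dataset of human-like 3D sketches at different levels of abstraction to train deep networks. Finally, we compare the common multi-view and volumetric approaches: we show that, in contrast to 3D shape to 3D shape retrieval, volumetric point-based approaches exhibit superior performance on 3D sketch to 3D shape retrieval, owing to the sparse and abstract nature of 3D VR-sketches. We believe these contributions will collectively serve as enablers for future attempts at this problem. The VR interface, code, and datasets are available at https://tinyurl.com/3dsketch3dv.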
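We present the first fine-grained dataset of 1,497 pairs of 3D VR sketches and 3D shapes for the chair category, with large shape diversity. Our dataset supports the recent trend in the sketch community towards fine-grained data analysis and extends it to the actively developing 3D domain. We argue for the most convenient sketching scenario, in which the sketch consists of sparse lines and does not require any sketching skills, prior training, or time-consuming accurate drawing. We then study, for the first time, the scenario of fine-grained 3D VR sketch to 3D shape retrieval, both as a novel VR sketching application and as a proving ground for generic insights to inform future research. By experimenting with carefully selected combinations of design factors on this new problem, we draw important conclusions to help follow-on work. We hope our dataset will enable other novel applications, especially those that require a fine-grained angle, such as fine-grained 3D shape reconstruction. The dataset is available at tinyurl.com/vrsketch3dv21.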
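We study the practical task of fine-grained 3D-VR-sketch-based 3D shape retrieval. This task is of particular interest because 2D sketches have been shown to be effective queries for 2D images. However, due to the domain gap, it remains hard to achieve strong performance in 3D shape retrieval from 2D sketches. Recent work demonstrated the advantage of 3D VR sketching for this task. In our work, we focus on the challenge caused by the inherent inaccuracies of 3D VR sketches. We observe that retrieval results obtained with a triplet loss with a fixed margin value, which is commonly used for retrieval tasks, contain many irrelevant shapes and often just one or a few with a structure similar to the query. To mitigate this problem, we draw, for the first time, a connection between adaptive margin values and shape similarity. In particular, we propose to use a triplet loss with an adaptive margin value driven by a "fitting gap", which is the similarity of two shapes under structure-preserving deformations. We also conduct a user study which confirms that this fitting gap is indeed a suitable criterion for evaluating the structural similarity of shapes. Furthermore, we introduce a dataset of 202 VR sketches for 202 3D shapes, drawn from memory rather than from observation. The code and data are available at https://github.com/rowl1ng/structure-aware-aware-vr-sketch-shape-retrieval.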
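The contrast between a fixed and an adaptive margin in a triplet loss can be sketched as follows. This is a minimal illustration, not the authors' implementation: the embeddings are hand-picked 2D points, the fitting-gap values are made up, the linear mapping from gap to margin in `adaptive_margin` is an assumption, and the sketch further assumes that a larger gap means structurally less similar shapes.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin):
    """Hinge-style triplet loss on Euclidean distances."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def adaptive_margin(fitting_gap, scale=1.0):
    """Assumed mapping: the margin grows linearly with the fitting gap
    between the positive and the negative shape, so structurally similar
    negatives are pushed away less than dissimilar ones."""
    return scale * fitting_gap

# Toy embeddings: a sketch (anchor), its ground-truth shape (positive),
# and another shape (negative).
anchor = np.array([0.0, 0.0])
positive = np.array([1.0, 0.0])
negative = np.array([2.0, 0.0])

fixed = triplet_loss(anchor, positive, negative, margin=1.5)
similar_neg = triplet_loss(anchor, positive, negative, adaptive_margin(0.2))
dissimilar_neg = triplet_loss(anchor, positive, negative, adaptive_margin(1.5))
print(fixed, similar_neg, dissimilar_neg)
```

With the fixed margin every negative is penalised alike, while the adaptive variant leaves a structurally similar negative (small gap) unpenalised in this toy configuration and still penalises a dissimilar one.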